Abstract: The standard algebraic decoding algorithm
of cyclic codes [n, k, d] up to the BCH bound
$\delta = 2t + 1$ is very efficient and practical for
relatively small n, while it becomes impractical for
large n as its computational complexity is $O(nt)$. The
aim of this paper is to show how to make this algebraic
decoding computationally more efficient: in the case of
binary codes, for example, the complexity of the
syndrome computation drops from $O(nt)$ to
$O(t\sqrt{n})$, while the average complexity of the
error location drops from $O(nt)$ to
$\max\{O(t\sqrt{n}),\, O(t \log^2(t) \log\log(t) \log(n))\}$.
I. Introduction

The algebraic decoding of cyclic codes up to the 
BCH bound, as obtained early in the sixties with 
the contribution of many people, was considered 
very efficient for the needs of that time ([1], [3], 
[18], [19], [23], [24]). However, today we can and 
need to manage error correcting codes of sizes that 
require more efficient algorithms, possibly at the 
limit of their theoretical minimum complexity. We 
are proposing here an algorithm that goes in this 
direction. 

Although our main point of reference and comparison
is classical algebraic decoding, other decoding
algorithms have recently been proposed; we only cite
them here as a reference, e.g. [10], [13], [21], [22].

Let us now summarize the standard algebraic decoding
of cyclic codes. Let C be an [n, k, d] cyclic code over
a finite field $\mathbb{F}_q$, $q = p^s$ for a prime p,
with generator polynomial of minimal degree r = n - k

$$g(x) = x^r + g_1 x^{r-1} + \cdots + g_{r-1} x + g_r ,$$

g(x) dividing $x^n - 1$, and let $\alpha$ be a primitive
n-th root of unity lying in a finite field
$\mathbb{F}_{p^m}$, where the extension degree is the
minimum integer m such that n is a divisor of
$p^m - 1$. Assuming that C has BCH bound
$\delta = 2t + 1$ (if $\delta$ is even, we would just
consider $\delta - 1$), then g(x) has 2t roots with
consecutive power exponents, so that the whole set of
roots is

$$\mathfrak{R} = \{\alpha^{\ell+1}, \alpha^{\ell+2}, \ldots, \alpha^{\ell+2t}, r_{2t+1}, \ldots, r_r\} ,$$

where it is not restrictive to take $\ell = 0$, as is
usually done.

Let $R(x) = g(x)I(x) + e(x)$ be a received code word
such that the error pattern e(x) has no more than
t nonzero coefficients. The Gorenstein-Peterson-Zierler
decoding procedure ([18], [23]), which is a
standard decoding procedure for every cyclic code
up to the BCH bound, is made up of four steps:

• Computation of 2t syndromes: $S_j = R(\alpha^j)$,
$j = 1, \ldots, 2t$.

• Computation of the error-locator polynomial
$\sigma(z) = \sigma_t z^t + \sigma_{t-1} z^{t-1} + \cdots + \sigma_1 z + 1$
(we are assuming the case that exactly t errors
occurred; if there are $t_e < t$ errors, this step
would output a polynomial of degree $t_e$).

• Computation of the roots of $\sigma(z)$ in the form
$\alpha^{-j_h}$, $h = 1, \ldots, t$, yielding the error
positions $j_h$.

• Computation of the error magnitudes.

Efficient implementations of this decoding algorithm
combine the computation of 2t syndromes using Horner's
rule, the Berlekamp-Massey algorithm to obtain the
error-locator polynomial, the Chien search to locate
the errors, and the evaluation of Forney's polynomial
to estimate the error magnitudes.

The computation of the 2t syndromes using
Horner's rule requires 2tn multiplications in
$\mathbb{F}_{p^m}$, which may be prohibitive when n is
large. The Berlekamp-Massey algorithm has
multiplicative complexity $O(t^2)$ ([3], [14]); it is
very efficient and will not be discussed further. The
Chien search again requires $O(tn)$ multiplications in
$\mathbb{F}_{p^m}$, and Forney's algorithm $O(t^2)$
([14]). Notice that this fourth step is not required if
we deal with binary codes and that both the first and
the fourth steps consist primarily of polynomial
evaluations, so they can benefit from any efficient
polynomial evaluation algorithm, as we will show.
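
As a point of reference for the 2tn count, the baseline
syndrome computation can be sketched as follows. This is
only an illustration under stated assumptions: the field
$\mathbb{F}_{2^6}$ is built from the primitive polynomial
$x^6 + x^4 + x^3 + x + 1$ (an assumption, the text does
not state which one is used), the received word is the
zero code word plus three errors at hypothetical positions
9, 31 and 50, and the helper names are ours.

```python
# GF(2^6) arithmetic; elements are 6-bit masks (bit i = coefficient of alpha^i).
M = 6
POLY = 0b1011011      # x^6 + x^4 + x^3 + x + 1: an assumed primitive polynomial
N = (1 << M) - 1      # code length n = 63
ALPHA = 0b10          # primitive element alpha (the class of x)

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with reduction by POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:            # degree reached M: reduce modulo POLY
            a ^= POLY
    return r

def gf_pow(a, e):
    """a**e by repeated multiplication (adequate for the small e used here)."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def horner(bits, beta):
    """Evaluate sum(bits[i] * x**i) at beta; return (value, multiplications)."""
    acc, mults = 0, 0
    for c in reversed(bits):
        acc = gf_mul(acc, beta) ^ c
        mults += 1
    return acc, mults

def syndromes(bits, t):
    """S_j = R(alpha^j) for j = 1..2t, with the total multiplication count."""
    values, total = [], 0
    for j in range(1, 2 * t + 1):
        value, mults = horner(bits, gf_pow(ALPHA, j))
        values.append(value)
        total += mults
    return values, total

# Hypothetical received word: zero code word plus errors at positions 9, 31, 50.
received = [0] * N
for position in (9, 31, 50):
    received[position] = 1

S, mults = syndromes(received, t=3)
print(S, mults)   # mults equals 2*t*n = 378
```

Every syndrome costs n field multiplications here, which
is the cost the following sections aim to reduce.
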
The standard decoding procedure is satisfactory
when the code length n is not too large (say
$n < 10^3$) and efficient implementations are set up
taking advantage of the particular structure of the
code. The situation changes dramatically when n is of
the order of $10^6$ or larger. In this case a
complexity $O(tn)$, required by the syndrome
evaluations and by the Chien search, is no longer
acceptable. This paper describes some methods to make
these steps more efficient and practical even for
large n. We will follow the usual approach of counting,
as above, the number of multiplications, as they are
more expensive than sums (see also [6]).
The paper is structured as follows: Section II
concerns the computation of syndromes. Section III
deals with the computation of the roots of the
error-locator polynomial as well as the corresponding
error positions; the error-locator polynomial is
supposed to be given (being computed by the
Berlekamp-Massey algorithm). Finally, Section IV gives
a numerical example illustrating the whole procedure.

II. Syndrome Evaluation 

Let $\beta$ be any element of $\mathfrak{R}$. The
standard Horner's rule ([15], [16]) allows us to
compute $R(\beta)$ in at most n products, thus for the
computation of 2t syndromes we have the estimate
$O(tn)$. However, in [7], [26] we showed that
polynomials over a finite field of characteristic p can
be evaluated more efficiently by exploiting the
Frobenius automorphism, i.e. the mapping
$\sigma(\beta) = \beta^p$, with a significant
computational cost reduction.



Briefly, to evaluate a polynomial r(x) of degree n
over $\mathbb{F}_{p^s}$ at $\beta$, an element of
$\mathbb{F}_{p^m}$, we write r(x) as a linear
combination of s polynomials $r_i(x)$ over
$\mathbb{F}_p$

$$r(x) = r_0(x) + \gamma r_1(x) + \cdots + \gamma^{s-1} r_{s-1}(x) ,$$

where $\{1, \gamma, \ldots, \gamma^{s-1}\}$ is a basis
for $\mathbb{F}_{p^s}$. Thus $r(\beta)$ is obtained as a
linear combination of s field elements $r_i(\beta)$. To
evaluate a polynomial R(x) over $\mathbb{F}_p$ at
$\beta$, one can profit by writing

$$R(x) = R_{1,0}(x^p) + x R_{1,1}(x^p) + \cdots + x^{p-1} R_{1,p-1}(x^p) ,$$

where $R_{1,0}(x^p)$ collects the powers of x with
exponent a multiple of p and in general
$x^i R_{1,i}(x^p)$ collects the powers of the form
$x^{ap+i}$, with $a \in \mathbb{N}$ and
$0 \le i \le p-1$. Thus $R(\beta)$ can be computed by
evaluating $\beta^p$, then computing every
$R_{1,i}(\beta^p)$ and finally computing the linear
combination. This procedure requires, for example,
nearly n/2 multiplications in the binary case, but the
further advantage is that it can be iterated. After L
steps, we need to evaluate polynomials $R_{L,i}(x)$ of
degree at most $\lfloor n/p^L \rfloor$. By a convenient
number L of iterations, and with a smart arrangement of
the multiplications ([7]), one can achieve an overall
complexity of approximately $2s\sqrt{n(p-1)}$. In the
particular case of binary codes, the complexity is
$2\sqrt{n}$.
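
The binary case of this splitting can be sketched as
follows. It is a simplified illustration, not the exact
arrangement of [7]: after L splittings the polynomial is
grouped by exponent modulo $2^L$, the powers of
$\beta^{2^L}$ are computed once, the sub-polynomials
(which have coefficients in $\mathbb{F}_2$) are then
evaluated with additions only, and the results are
combined by Horner's rule in $\beta$. The field polynomial
$x^6 + x^4 + x^3 + x + 1$, the function names and the
error positions 9, 31, 50 are assumptions made for the
example.

```python
M = 6
POLY = 0b1011011      # x^6 + x^4 + x^3 + x + 1: an assumed primitive polynomial
N = (1 << M) - 1      # n = 63
ALPHA = 0b10

def gf_mul(a, b):
    """Multiply two GF(2^6) elements stored as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def horner(bits, beta):
    """Reference evaluation with one multiplication per coefficient."""
    acc = 0
    for c in reversed(bits):
        acc = gf_mul(acc, beta) ^ c
    return acc

def split_eval(bits, beta, L):
    """Evaluate a binary polynomial at beta after L splittings.

    Returns (value, multiplications)."""
    q = 1 << L
    mults = 0
    gamma = beta
    for _ in range(L):                      # gamma = beta^(2^L) by squaring
        gamma = gf_mul(gamma, gamma)
        mults += 1
    kmax = (len(bits) - 1) // q             # largest power of gamma needed
    powers = [1, gamma]
    while len(powers) <= kmax:
        powers.append(gf_mul(powers[-1], gamma))
        mults += 1
    parts = [0] * q                         # parts[i] = R_{L,i}(gamma), additions only
    for index, bit in enumerate(bits):
        if bit:
            parts[index % q] ^= powers[index // q]
    acc = parts[q - 1]                      # combine: sum of beta^i * parts[i]
    for i in range(q - 2, -1, -1):
        acc = gf_mul(acc, beta) ^ parts[i]
        mults += 1
    return acc, mults

received = [0] * N
for position in (9, 31, 50):
    received[position] = 1

value, mults = split_eval(received, ALPHA, L=3)
print(value == horner(received, ALPHA), mults)
```

With n = 63 and L = 3 the count is 3 squarings, 6 powers
of $\beta^8$ and 7 combining products, i.e. 16
multiplications instead of 63.
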

It should be remarked that in hardware implementations,
the proposed algorithm allows a strong
parallelism, while Horner's rule is inherently serial.
In fact, if L is the number of iterations, the
evaluation of the $p^L$ polynomials $R_{L,i}(x)$ can be
done in parallel. Moreover an additional gain may
be given by the pre-computation of the powers of
$\beta$, especially when the number of syndromes to be
computed is large. Furthermore, as in Horner's rule,
multiplication by $\beta$ or its powers can be performed
using Linear Feedback Shift Registers ([12], [18],
[20]) with a further speed-up at a very small cost,
while the p-power operations would benefit from
the use of a normal basis ([15], [17]).

Lastly, it should also be remarked that, in particular
situations, a better cost reduction can be obtained
by means of a different use of the Frobenius
automorphism and a careful choice of the number
of iterations. As an example, in [26] we described
a method of computing the syndromes for the
famous Reed-Solomon code [255, 223, 33] over
$\mathbb{F}_{2^8}$ used by NASA ([29]), that employs
6735 multiplications to evaluate 32 syndromes, versus
8159 multiplications that are necessary using Horner's
rule. The direct application of the method outlined
above would not be convenient in this situation
because of the particular parameters involved.

III. Roots of the error-locator polynomial

Once the error-locator polynomial $\sigma(z)$ is
computed from the syndromes using the Berlekamp-Massey
algorithm, its roots, represented in the
form $\alpha^{-\ell_i}$, correspond to the error
positions $\ell_i$, $i = 1, \ldots, t$, which are
generally found by testing $\sigma(\alpha^{-j})$ for
all n possible powers $\alpha^{-j}$ with an algorithm
usually referred to as the Chien search. In
this approach, if $\sigma(\alpha^{-j}) = 0$ an error in
position j is recognized, otherwise the position is
correct. However, this simple mechanism can be
unacceptably slow when n is large since its complexity
is $O(tn)$: the aim of this Section is to describe a
less costly procedure.
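
For comparison with what follows, the exhaustive search
can be sketched as below. This is a plain
evaluate-every-position version (hardware Chien searches
usually update the terms incrementally); the field
polynomial $x^6 + x^4 + x^3 + x + 1$, the helper names and
the error positions 9, 31, 50 are assumptions made for the
illustration.

```python
M = 6
POLY = 0b1011011      # x^6 + x^4 + x^3 + x + 1: an assumed primitive polynomial
N = (1 << M) - 1      # n = 63
ALPHA = 0b10

def gf_mul(a, b):
    """Multiply two GF(2^6) elements stored as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def locator_from_positions(positions):
    """sigma(z) = product of (1 + alpha^j z); coefficients lowest degree first."""
    sigma = [1]
    for j in positions:
        xj = gf_pow(ALPHA, j)
        scaled = [0] + [gf_mul(xj, c) for c in sigma]     # xj * z * sigma(z)
        sigma = [a ^ b for a, b in zip(sigma + [0], scaled)]
    return sigma

def chien_search(sigma):
    """Test sigma(alpha^-j) for j = 0..n-1 and return the positions j of the roots."""
    alpha_inv = gf_pow(ALPHA, N - 1)
    positions, x = [], 1
    for j in range(N):
        acc = 0
        for c in reversed(sigma):          # Horner: about t products per position
            acc = gf_mul(acc, x) ^ c
        if acc == 0:
            positions.append(j)
        x = gf_mul(x, alpha_inv)           # next candidate alpha^-(j+1)
    return positions

sigma = locator_from_positions((9, 31, 50))
print(chien_search(sigma))
```

All n candidates are tested regardless of how few errors
occurred, which is the source of the $O(tn)$ cost.
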

The Cantor-Zassenhaus probabilistic factorization
algorithm ([4]) is very efficient in factoring a
polynomial and consequently in computing the roots of
a polynomial ([2], [11]). Since $\sigma(z)$ is the
product of t linear factors $z + \rho_i$ over
$\mathbb{F}_{p^m}$ (i.e. $\rho_i$ is a p-ary polynomial
in $\alpha$ of degree m - 1), this factoring
algorithm can be directly applied to separate these
t factors. The error positions $\ell_i$ are then
obtained by computing the discrete logarithm of
$(\rho_i)^{-1} = \alpha^{\ell_i}$ to base $\alpha$. This
task can be performed by Shanks' algorithm ([27]),
which we revisit below. The overall expected complexity
of finding the error positions with this algorithm is
$O(mt \log^2 t \log\log t)$ ([2]), plus
$O(t\sqrt{n})$, where the second addend comes from
Shanks' algorithm. Moreover, better computational
estimates may be obtained taking into account the
considerations and improvements highlighted in [8].

a) Cantor-Zassenhaus algorithm: The algorithm
of Cantor-Zassenhaus ([4]) is described here for
easy reference (see also the analysis in [8]). We
describe only the case of characteristic 2, which is
by far the most common in practice; the interested
reader can find the general situation in [4], [8].
Assume that p(z) is a polynomial over
$\mathbb{F}_{2^m}$ that is a product of t polynomials of
degree 1 over the same field $\mathbb{F}_{2^m}$, m even
(when m is odd it is enough to consider a quadratic
extension and proceed as in the case of even m).
Suppose that $\alpha$ is a known primitive element in
$\mathbb{F}_{2^m}$, and set $\ell_m = (2^m - 1)/3$; then
$\rho = \alpha^{\ell_m}$ is a primitive cubic root of
unity in $\mathbb{F}_{2^m}$, so that $\rho$ is a root of
$z^2 + z + 1$. The algorithm consists of the
following steps:

1) Generate a random polynomial b(z) of degree
not greater than t - 1 over $\mathbb{F}_{2^m}$.

2) Compute $a(z) = b(z)^{\ell_m} \bmod p(z)$.

3) IF $a(z) \neq 0, 1, \rho, \rho^2$, THEN at least one
polynomial among

$\gcd\{p(z), a(z)\}$, $\gcd\{p(z), a(z) + 1\}$,
$\gcd\{p(z), a(z) + \rho\}$, $\gcd\{p(z), a(z) + \rho^2\}$

will be a non-trivial factor of p(z), ELSE repeat
from step 1.

4) Iterate until all linear factors of p(z) are found.
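
The four steps above can be sketched in a few dozen lines
for m = 6. This is a minimal illustration, not an
optimized implementation: polynomials are plain
coefficient lists, the field polynomial
$x^6 + x^4 + x^3 + x + 1$ is an assumption, all function
names are ours, and the input is assumed to be a product of
distinct linear factors. Since every root lands in exactly
one of the four gcds, the sketch recurses on all
non-constant gcds at once.

```python
import random

M, POLY = 6, 0b1011011        # GF(2^6) via x^6+x^4+x^3+x+1 (assumed)
N = (1 << M) - 1
LM = N // 3                   # l_m = (2^m - 1)/3, m even

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, N - 1)

RHO = gf_pow(0b10, LM)        # primitive cube root of unity
CONSTS = (0, 1, RHO, gf_mul(RHO, RHO))

# Polynomials: coefficient lists, lowest degree first; [] is the zero polynomial.
def trim(p):
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_mod(a, p):
    a, inv = trim(a[:]), gf_inv(p[-1])
    while len(a) >= len(p):
        f, shift = gf_mul(a[-1], inv), len(a) - len(p)
        for i, c in enumerate(p):
            a[shift + i] ^= gf_mul(f, c)    # cancels the leading term of a
        trim(a)
    return a

def poly_mulmod(a, b, p):
    if not a or not b:
        return []
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] ^= gf_mul(x, y)
    return poly_mod(prod, p)

def poly_powmod(b, e, p):
    result, base = [1], poly_mod(b, p)
    while e:
        if e & 1:
            result = poly_mulmod(result, base, p)
        base = poly_mulmod(base, base, p)
        e >>= 1
    return result

def poly_gcd(a, b):
    a, b = trim(a[:]), trim(b[:])
    while b:
        a, b = b, poly_mod(a, b)
    inv = gf_inv(a[-1])                      # normalize to a monic gcd
    return [gf_mul(c, inv) for c in a]

def add_const(a, c):
    out = a[:] if a else [0]
    out[0] ^= c
    return trim(out)

def find_roots(p, rng):
    """Roots of p, assumed to split into distinct linear factors."""
    if len(p) <= 1:
        return []
    if len(p) == 2:                          # p1*z + p0 has root p0/p1
        return [gf_mul(p[0], gf_inv(p[1]))]
    while True:
        b = [rng.randrange(N + 1) for _ in range(len(p) - 1)]   # step 1
        a = poly_powmod(b, LM, p)                               # step 2
        parts = [poly_gcd(p, add_const(a, c)) for c in CONSTS]  # step 3
        parts = [g for g in parts if len(g) > 1]
        if len(parts) > 1:                                      # step 4
            return [r for g in parts for r in find_roots(g, rng)]

def poly_from_roots(roots):
    p = [1]
    for r in roots:                          # multiply by (z + r)
        p = [x ^ y for x, y in zip([0] + p, [gf_mul(r, c) for c in p] + [0])]
    return p

sigma_star = poly_from_roots([gf_pow(0b10, l) for l in (31, 9, 50)])
print(sorted(find_roots(sigma_star, random.Random(0))))
```

The seed passed to `random.Random` only fixes the sequence
of trial polynomials; the set of roots returned does not
depend on it.
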

b) Remark 1: As shown in [8], the polynomial
b(z) can be conveniently chosen of the form
$z + \beta$, using b(z) = z as initial choice. Let
$\theta$ be a generator of the cyclic subgroup of
$\mathbb{F}_{2^m}^*$ of order $\ell_m$. If
$z^{\ell_m} = \rho^i \bmod \sigma(z)$,
$i \in \{0, 1, 2\}$, then each root $\zeta_h$ of
$\sigma(z)$ is of the form $\alpha^i \theta^j$. If this
is the case, which does not allow us to find a factor,
we repeat the test with $b(z) = z + \beta$ for some
$\beta$ and we will succeed as soon as the elements
$\zeta_h + \beta$ are not all of the type
$\alpha^i \theta^j$ for the same $i \in \{0, 1, 2\}$.
This can be shown to happen probabilistically, and
often deterministically, very soon, especially when the
degree of $\sigma(z)$ is high. In most practical
situations it is actually very seldom that more than
two iterations are needed, which explains its
widespread use.

c) Shanks' algorithm: Shanks' algorithm can be
applied to compute the discrete logarithm in a
group of order n generated by the primitive element
$\alpha$. The exponent $\ell$ in the equality

$$\alpha^{\ell} = b_0 + b_1 \alpha + \cdots + b_{s-1} \alpha^{s-1}$$

is written in the form
$\ell = \ell_0 + \ell_1 \lceil \sqrt{n} \rceil$. A
table T is constructed with $\lceil \sqrt{n} \rceil$
entries $\alpha^{\ell_1 \lceil \sqrt{n} \rceil}$ which
are sorted in some well defined order, then a cycle of
length $\lceil \sqrt{n} \rceil$ is started computing

$$A_j = (b_0 + b_1 \alpha + \cdots + b_{s-1} \alpha^{s-1})\, \alpha^{-j}, \quad j = 0, \ldots, \lceil \sqrt{n} \rceil - 1 ,$$

and looking for $A_j$ in the table; when a match
is found with the k-th entry, we set $\ell_0 = j$ and
$\ell_1 = k$, and the discrete logarithm $\ell$ is
obtained as $j + k \lceil \sqrt{n} \rceil$.

This algorithm can be performed with complexity
$O(\sqrt{n})$ both in time and space (memory). In our
scenario, since we need to compute t roots, the
complexity is $O(t\sqrt{n})$.
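
The table construction and the search loop can be sketched
as follows. The sketch uses a hash table (a Python dict)
in place of the sorted table, which is a substitution made
for brevity; the field polynomial
$x^6 + x^4 + x^3 + x + 1$ and the function names are
likewise assumptions.

```python
import math

M = 6
POLY = 0b1011011      # x^6 + x^4 + x^3 + x + 1: an assumed primitive polynomial
N = (1 << M) - 1      # group order n = 63
ALPHA = 0b10

def gf_mul(a, b):
    """Multiply two GF(2^6) elements stored as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def build_table(n):
    """Table of the ceil(sqrt(n)) giant steps alpha^(k*s), k = 0..s-1."""
    s = math.isqrt(n - 1) + 1              # ceil(sqrt(n)) for n >= 1
    giant = gf_pow(ALPHA, s)
    table, value = {}, 1
    for k in range(s):
        table[value] = k                   # value = alpha^(k*s)
        value = gf_mul(value, giant)
    return s, table

def discrete_log(x, s, table):
    """Return (l, lookups) with alpha^l = x, using at most s table lookups."""
    alpha_inv = gf_pow(ALPHA, N - 1)
    a = x
    for j in range(s):                     # A_j = x * alpha^-j
        if a in table:
            return (j + table[a] * s) % N, j + 1
        a = gf_mul(a, alpha_inv)
    raise ValueError("x is not a power of alpha")

s, table = build_table(N)
for l in (31, 9, 50):
    print(l, discrete_log(gf_pow(ALPHA, l), s, table))
```

The table is built once and reused for all t roots, so
only the search loop is repeated per root.
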

d) Remark 2: The Cantor-Zassenhaus algorithm
finds the roots $X_j = \alpha^{\ell_j}$ of the
reciprocal of the error-locator polynomial, then the
baby-step giant-step algorithm of Shanks finds the
error positions $\ell_j$. As said in the Introduction,
this is the end of the decoding process for binary
codes. For non-binary codes, Forney's polynomial
$\Gamma(x) = \sigma(x)(S(x) + 1) \bmod x^{2t+1}$, where
$S(x) = \sum_{i=1}^{2t} S_i x^i$ ([28]), yields
the error values

$$Y_j = -X_j \frac{\Gamma(X_j^{-1})}{\sigma'(X_j^{-1})} .$$

Again we remark that this last step can benefit from
an efficient polynomial evaluation algorithm, such
as the one discussed in Section II.

e) Remark 3: We observe that the above procedure
can be used to decode beyond the BCH bound,
up to the minimum distance, whenever the
error-locator polynomial can be computed from a full
set of syndromes ([5], [9], [25], [28]).

IV. A Numerical Example

In the previous sections we presented methods
to compute syndromes and error locations in the
GPZ decoding scheme of cyclic codes up to their
BCH bound, which are asymptotically better than
the classical algorithms. The following example
illustrates the complete new procedure.

Consider a binary BCH code [63, 45, 7] with generator
polynomial

$$g(x) = x^{18} + x^{17} + x^{14} + x^{13} + x^9 + x^7 + x^5 + x^3 + 1$$

whose roots are

$$\alpha, \alpha^2, \alpha^4, \alpha^8, \alpha^{16}, \alpha^{32}, \alpha^3, \alpha^6, \alpha^{12}, \alpha^{24}, \alpha^{48}, \alpha^{33}, \alpha^5, \alpha^{10}, \alpha^{20}, \alpha^{40}, \alpha^{17}, \alpha^{34},$$

thus the BCH bound is 7. Let c(x) = g(x)I(x) be a
transmitted code word, and let the received word r(x),
a binary polynomial with 19 nonzero coefficients, be
such that 3 errors occurred. The 6 syndromes are

$$S_1 = \alpha^5 + \alpha^2 + \alpha, \quad S_2 = S_1^2,$$
$$S_3 = \alpha^5 + \alpha^4 + \alpha^3 + \alpha^2 + \alpha, \quad S_4 = S_2^2,$$
$$S_5 = \alpha^5 + \alpha^2 + 1, \quad S_6 = S_3^2 .$$

For example, $S_1$ has been computed considering
r(x) as

$$[r_{3,0} + z r_{3,1} + y(r_{3,2} + z r_{3,3})] + x[r_{3,4} + z r_{3,5} + y(r_{3,6} + z r_{3,7})],$$

with $y = x^2$, $z = x^4$, $w = x^8$ and the $r_{3,i}$
binary polynomials in w of degree at most 7 (for
instance $r_{3,6} = 1$), with only 16 products, namely
3 to compute $\alpha^2$, $\alpha^4$ and $\alpha^8$, 6 for
the powers of w up to $w^7$ and 7 multiplications by
x, y and z.

The coefficients of the error-locator polynomial
turn out to be

$$\sigma_1 = \alpha^5 + \alpha^2 + \alpha, \quad \sigma_2 = \alpha^4 + \alpha^3 + \alpha, \quad \sigma_3 = \alpha^5 + \alpha^4 + \alpha^2 .$$

The roots of
$\sigma^*(z) = z^3 \sigma(z^{-1}) = \prod_{i=1}^{3} (z - \alpha^{\ell_i})$
are computed as follows using the Cantor-Zassenhaus
algorithm.

Let $\rho = \alpha^{21}$ be a cube root of unity;
consider a random polynomial, for instance
$z + \rho$, of degree less than 3 and compute
$a(z) = (z + \rho)^{21}$ modulo $\sigma^*(z)$ (the
exponent of $z + \rho$ is
$\ell_m = (2^6 - 1)/3 = 21$):

$$a(z) = (\alpha^5 + \alpha^4 + \alpha^2 + \alpha + 1) z^2 + (\alpha^3 + \alpha + 1) z + \alpha^5 + \alpha^4 + \alpha^3 + 1 .$$



In this case a(z) has no root in common with
$\sigma^*(z)$, while

$\gcd(a(z) + 1, \sigma^*(z)) = z + (\alpha^4 + \alpha^3 + 1)$ ($\ell_1 = 31$),

$\gcd(a(z) + \rho, \sigma^*(z)) = z + (\alpha^5 + \alpha^4 + \alpha^2 + 1)$ ($\ell_2 = 9$),

$\gcd(a(z) + \rho^2, \sigma^*(z)) = z + (\alpha^3 + \alpha)$ ($\ell_3 = 50$).

The error positions have been obtained using
Shanks' algorithm with a table of 8 entries, and
a loop of length 8 for each root, for a total of 24
searches versus 63 searches of Chien's search.
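
The figures of this example can be cross-checked with a
short script. It is only a sanity check under one
assumption: that $\alpha$ is a root of
$x^6 + x^4 + x^3 + x + 1$, a choice not stated in the text
but consistent with the polynomial forms of
$\alpha^{31}$, $\alpha^{9}$ and $\alpha^{50}$ quoted above.
The script verifies that $b(z) = z + \rho$ sends the three
roots to three different cube roots of unity, which is why
a single round separates them, and recomputes the number
of searches.

```python
import math

M = 6
POLY = 0b1011011      # x^6 + x^4 + x^3 + x + 1: an assumed primitive polynomial
N = (1 << M) - 1      # n = 63
ALPHA = 0b10

def gf_mul(a, b):
    """Multiply two GF(2^6) elements stored as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

RHO = gf_pow(ALPHA, 21)                       # cube root of unity rho = alpha^21
roots = [gf_pow(ALPHA, l) for l in (31, 9, 50)]

# Value of b(z)^21 = (z + rho)^21 at each root: the class that decides
# which of the four gcds contains the corresponding linear factor.
classes = [gf_pow(x ^ RHO, 21) for x in roots]
print(classes == [1, RHO, gf_mul(RHO, RHO)])

table_size = math.isqrt(N - 1) + 1            # ceil(sqrt(63)) = 8
print(table_size, 3 * table_size, N)          # 8 entries, 24 searches, 63 for Chien
```
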
V. Concluding remarks

A new decoding algorithm for cyclic codes has 
been presented having a very competitive complex- 
ity and targeting in particular those applications 
using error correcting codes with very large length. 